85 research outputs found

    A Computational Approach for Identifying Plant-Based Foods for Addressing Vitamin Deficiency Diseases

    Vitamins are nutrients that are essential to human health, and deficiencies have been shown to cause severe diseases. In this study, a computational approach was used to identify vitamin deficiency diseases and plant-based foods with vitamin content. Data from the United States Department of Agriculture Standard Reference (SR27), the National Library of Medicine's Medical Subject Headings and MEDLINE, and Wikipedia were combined to identify vitamin deficiency diseases and the vitamin content of plant-based foods. A total of 41,584 vitamin-disease associations were identified from MEDLINE-indexed articles as well as from entries in Wikipedia. The SR27 identified 1912 foods that contained at least one vitamin, with an average of 1276 foods per vitamin. Vitamins B12 and D were found in the fewest foods (n=135 and 70, respectively). The results of this study establish the foundation for developing a process to link vitamin deficiency diseases to vitamin-rich foods.

    Grand challenges in biodiversity informatics

    Author Posting. © The Author, 2007. This is the author's version of the work. It is posted here by permission of KH Biotech Services Pte for personal use, not for redistribution. The definitive version was published in Asia-Pacific Biotech News 11(1): 15-18. The exponentially growing array of biological data has necessitated the development of a new information management domain, biodiversity informatics. It is one of the newest members of the 'informatics' sub-disciplines, which all generally focus on the management of information through the application of advanced technologies. Like other informatics sub-disciplines, biodiversity informatics depends on fundamental computer science and information science principles to facilitate the management of heterogeneous data. Biodiversity informatics distinguishes itself as being the most focused on biological knowledge dating back to the earliest dates of recorded history – while most biological or biomedical informatics studies focus on organizing and studying information spanning less than 100 years, the scope of biodiversity informatics spans the age of the Earth. Biodiversity informatics is also concerned with the widest range of disparate data types – including climatology, epidemiology, geography, and taxonomy. To this end, many informatics principles can readily be incorporated into biodiversity informatics; however, there are equally as many challenges that will require creative solutions. Here, several such challenges are presented in an effort to lay a framework for the types of issues that will define the future of biodiversity informatics and, in turn, the future of biology and biomedicine.

    Biodiversity informatics : organizing and linking information across the spectrum of life

    This article has been accepted for publication in Briefings in Bioinformatics © 2007 The Author. Published by Oxford University Press. All rights reserved. This is a pre-print, electronic version of an article published in Briefings in Bioinformatics 8 (2007) 347-357, doi:10.1093/bib/bbm037. Biological knowledge can be inferred from three major levels of information: molecules, organisms, and ecologies. Bioinformatics is an established field that has made significant advances in the development of systems and techniques to organize contemporary molecular data; biodiversity informatics is an emerging discipline that strives to develop methods to organize knowledge at the organismal level extending back to the earliest dates of recorded natural history. Furthermore, while bioinformatics studies generally focus on detailed examinations of key "model" organisms, biodiversity informatics aims to develop over-arching hypotheses that span the entire tree of life. Biodiversity informatics is presented here as a discipline that unifies biological information from a range of contemporary and historical sources across the spectrum of life using organisms as the linking thread. The present review primarily focuses on the use of organism names as a universal meta-data element to link and integrate biodiversity data across a range of data sources.

    Structural network analysis of biological networks for assessment of potential disease model organisms

    Model organisms provide opportunities to design research experiments focused on disease-related processes (e.g., using genetically engineered populations that produce phenotypes of interest). For some diseases, there may be non-obvious model organisms that can help in the study of underlying disease factors. In this study, an approach is presented that leverages knowledge about human diseases and associated biological interaction networks to identify potential model organisms for a given disease category. The approach starts with the identification of functional and interaction patterns of diseases within genetic pathways. Next, these characteristic patterns are matched to interaction networks of candidate model organisms to identify similar subsystems that have characteristic patterns for diseases of interest. The quality of a candidate model organism is then determined by the degree to which the identified subsystems match genetic pathways from validated knowledge. The results of this study suggest that non-obvious model organisms may be identified through the proposed approach.

    TaxonGrab: Extracting Taxonomic Names From Text

    Identification of organism names in biological texts is essential for the management of archival resources to facilitate comparative biological investigation. Because organism nomenclature conforms closely to prescribed rules, automated techniques may be useful for identifying organism names from existing documents, and may also support the completion of comprehensive indices of taxonomic names; such comprehensive lists are not yet available. Using a combination of contextual rules and a language lexicon, we have developed a set of simple computational techniques for extracting taxonomic names from biological text. Our proposed method consistently performs at greater than 96% Precision and 94% Recall, and at a much higher speed than manual extraction techniques. An implementation of the described method is available as a Web-based tool written in PHP. Additionally, the PHP source code is available from SourceForge: http://sourceforge.net/projects/taxongrab, and the project website is http://research.amnh.org/informatics/taxlit/apps/.
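
As a rough illustration of the idea of combining a contextual rule with a language lexicon, the sketch below flags two-word sequences shaped like a Latin binomial and discards those whose words appear in an English word list. It is not the TaxonGrab implementation (which is written in PHP); the pattern, the tiny lexicon, and the function names are all assumptions made for this example. A second helper shows how Precision and Recall figures of the kind quoted above are computed from a set of extracted names and a hand-checked reference set.

```python
import re

# Hypothetical stand-in for a language lexicon; a realistic one would hold
# many thousands of common English words.
ENGLISH_LEXICON = {
    "the", "in", "was", "found", "near", "and", "this",
    "species", "river", "we", "collected", "several",
}

# Assumed contextual rule: a capitalized word followed by a lowercase word
# of at least three letters, the typical shape of a genus-species binomial.
BINOMIAL_PATTERN = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def extract_candidate_names(text, lexicon=ENGLISH_LEXICON):
    """Return binomial-shaped word pairs whose words are absent from the lexicon."""
    candidates = []
    for match in BINOMIAL_PATTERN.finditer(text):
        genus, species = match.groups()
        # Words found in the ordinary-language lexicon are unlikely to be names.
        if genus.lower() not in lexicon and species not in lexicon:
            candidates.append(f"{genus} {species}")
    return candidates


def precision_recall(predicted, expected):
    """Compute precision and recall of predicted names against a reference set."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


sample = ("We collected Homo sapiens and Drosophila melanogaster "
          "near the river. The species was found.")
names = extract_candidate_names(sample)
```

A real extractor would need many more rules (abbreviated genera, subspecies, authority strings), so this sketch only conveys the general shape of a rule-plus-lexicon filter.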

    Identifying Phytochemicals from Biomedical Literature Utilizing Semantic Knowledge Sources

    Chemicals derived from plants (phytochemicals) are major concepts of interest in the study of medicinal plants. To date, efforts to catalogue and organize phytochemical knowledge have resorted to manual approaches. This study explored the potential to leverage publicly accessible semantic knowledge sources for identifying possible phytochemicals. Within the context of this feasibility study, putative phytochemicals were identified for more than 4,000 plants from the Medical Subject Headings Supplementary Concept Records and the Semantic MEDLINE Database. An examination of phytochemicals identified for five selected plant species using the method developed here reveals that there is a disparity in electronically catalogued phytochemical knowledge compared to information from Dr. Duke's Phytochemical and Ethnobotanical Databases maintained by the United States Department of Agriculture. The results therefore suggest that semantic knowledge sources for biomedicine can be utilized as a source for identifying potential phytochemicals and thus contribute to the overall curation of plant phytochemical knowledge.

    Exploring historical trends using taxonomic name metadata

    © 2008 Sarkar et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution License. The definitive version was published in BMC Evolutionary Biology 8 (2008): 144, doi:10.1186/1471-2148-8-144. Authority and year information have been attached to taxonomic names since Linnaean times. The systematic structure of taxonomic nomenclature facilitates the ability to develop tools that can be used to explore historical trends that may be associated with taxonomy. From the over 10.7 million taxonomic names that are part of the uBio system, approximately 3 million names were identified to have taxonomic authority information from the years 1750 to 2004. A pipe-delimited file was then generated, organized according to a Linnaean hierarchy and by years from 1750 to 2004, and imported into an Excel workbook. A series of macros were developed to create an Excel-based tool and a complementary Web site to explore the taxonomic data. A cursory and speculative analysis of the data reveals observable trends that may be attributable to significant events that are of both taxonomic (e.g., publishing of key monographs) and societal importance (e.g., world wars). The findings also help quantify the number of taxonomic descriptions that may be made available through digitization initiatives. Temporal organization of taxonomic data can be used to identify interesting biological epochs relative to historically significant events and ongoing efforts. We have developed an Excel workbook and complementary Web site that enables one to explore taxonomic trends for Linnaean taxonomic groupings, from Kingdoms to Families. The work presented here was funded in part by the MBLWHOI Library and the DAB Lindberg Research Fellowship from the Medical Library Association to INS.

    Exploring Complex Disease Gene Relationships Using Simultaneous Analysis

    The characterization of complex diseases remains a great challenge for biomedical researchers due to the myriad interactions of genetic and environmental factors. Adaptation of phylogenomic techniques to increasingly available genomic data provides an evolutionary perspective that may elucidate important unknown features of complex diseases. Here an automated method is presented that leverages publicly available genomic data and phylogenomic techniques. The approach is tested with nine genes implicated in the development of Alzheimer Disease, a complex neurodegenerative syndrome. The developed technique, which is an update to a previously described Perl script called "ASAP" and is implemented as a suite of Ruby scripts entitled "ASAP2," first compiles a list of sequence-similarity based orthologues using PSI-BLAST and a recursive NCBI BLAST+ search strategy, then constructs maximum parsimony phylogenetic trees for each set of nucleotide and protein sequences, and calculates phylogenetic metrics (partitioned Bremer support values, combined branch scores, and Robinson-Foulds distance) to provide an empirical assessment of evolutionary conservation within a given genetic network. This study demonstrates the potential for using automated simultaneous phylogenetic analysis to uncover previously unknown relationships among disease-associated genes that may not have been apparent using traditional, single-gene methods. Furthermore, the results provide the first integrated evolutionary history of an Alzheimer Disease gene network and identify potentially important co-evolutionary clustering around components of oxidative stress pathways.
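
Of the metrics named above, the Robinson-Foulds distance is the simplest to illustrate. The sketch below computes the rooted (clade-based) variant: it counts the clades that appear in one tree but not in the other. It is written in Python for illustration only and is not taken from ASAP2, which is a suite of Ruby scripts; the nested-tuple tree encoding and the function names are assumptions of this example.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree.

    A tree is a nested tuple whose leaves are strings (an encoding assumed
    for this sketch). Each clade is the frozenset of leaf names below one
    internal node.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        leaf_set = frozenset().union(*(leaves(child) for child in node))
        found.add(leaf_set)
        return leaf_set

    all_leaves = leaves(tree)
    # The root clade contains every leaf and is shared by all trees on the
    # same leaf set, so it carries no information.
    found.discard(all_leaves)
    return found


def rooted_rf_distance(tree_a, tree_b):
    """Count clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


balanced = (("A", "B"), ("C", "D"))
regrouped = (("A", "C"), ("B", "D"))
ladder = ((("A", "B"), "C"), "D")
```

Here `balanced` and `regrouped` share no clades, while `balanced` and `ladder` share the clade {A, B}, so the second pair is closer. The classical definition for unrooted trees compares bipartitions rather than clades.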

    GenBank and PubMed : how connected are they?

    © 2009 Sarkar et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License. The definitive version was published in BMC Research Notes 2 (2009): 101, doi:10.1186/1756-0500-2-101. GenBank® is a public repository of all publicly available molecular sequence data from a range of sources. In addition to relevant metadata (e.g., sequence description, source organism and taxonomy), publication information is recorded in the GenBank data file. The identification of literature associated with a given molecular sequence may be an essential first step in developing research hypotheses. Although many of the publications associated with GenBank records may not be linked into or part of complementary literature databases (e.g., PubMed), GenBank records associated with literature indexed in Medline are identifiable as they contain PubMed identifiers (PMIDs). Here we show that an analysis of 87,116,501 GenBank sequence files reveals that 42% are associated with a publication or patent. Of these, 71% are associated with PMIDs, and can therefore be linked to a citation record in the PubMed database. The remaining 29% of publication-associated GenBank entries either do not have PMIDs or cite a publication that is not currently indexed by PubMed. We also identify the journal titles that are linked through citations in the GenBank files to the largest number of sequences. Our analysis suggests that GenBank contains molecular sequences from a range of disciplines beyond biomedicine, the initial scope of PubMed. The findings thus suggest opportunities to develop mechanisms for integrating biological knowledge beyond the biomedical field. INS and HM are funded in part by a research grant from the Ellison Medical Foundation and National Library of Medicine award R01LM009725 to INS.
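
The percentages in this abstract are nested (71% of the 42%), so it may help to work them through. The sketch below multiplies the reported figures to estimate record counts and the share of all GenBank files that carry a PMID. Because the abstract gives rounded percentages, the resulting counts are approximations and not figures reported by the authors; the constant and function names are made up for this example.

```python
# Figures quoted in the abstract.
TOTAL_RECORDS = 87_116_501
SHARE_WITH_PUBLICATION = 0.42    # records citing a publication or patent
SHARE_OF_THOSE_WITH_PMID = 0.71  # of those, records carrying a PMID


def linkage_estimates(total=TOTAL_RECORDS,
                      pub_share=SHARE_WITH_PUBLICATION,
                      pmid_share=SHARE_OF_THOSE_WITH_PMID):
    """Estimate record counts implied by the rounded percentages."""
    with_publication = total * pub_share
    with_pmid = with_publication * pmid_share
    return {
        "with_publication": round(with_publication),
        "with_pmid": round(with_pmid),
        "without_pmid": round(with_publication - with_pmid),
        # Fraction of ALL records that can be linked to PubMed.
        "pmid_share_of_all": pub_share * pmid_share,
    }


estimates = linkage_estimates()
```

Under these assumptions roughly 36.6 million files cite a publication or patent, about 26 million of them carry a PMID, and just under 30% of all files analyzed can be linked to a PubMed citation.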